-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(bug): txtar to reproduce the VM hanging during Render #1736
base: master
Are you sure you want to change the base?
Conversation
e0f3370
to
57953f1
Compare
Signed-off-by: Jeff Thompson <[email protected]>
…args Signed-off-by: Jeff Thompson <[email protected]>
57953f1
to
2f78ae2
Compare
I'm still investigating, but my preliminary findings is that this is actually occurring while attempting to do |
Okay, as I was mentioning in the last comment this has to do with printing "very recursive" values like the avl trees you're working with. If we apply this patch, which essentially forces a maximum depth of 8 when printing values, we get to see the "true" error: diff --git a/gnovm/pkg/gnolang/values_string.go b/gnovm/pkg/gnolang/values_string.go
index 34187e32..86274b19 100644
--- a/gnovm/pkg/gnolang/values_string.go
+++ b/gnovm/pkg/gnolang/values_string.go
@@ -11,19 +11,22 @@ type protectedStringer interface {
ProtectedString(*seenValues) string
}
-// This indicates the maximum ancticipated depth of the stack when printing a Value type.
-const defaultSeenValuesSize = 32
+const seenValuesSize = 8
type seenValues struct {
- values []Value
+ values [seenValuesSize]Value
+ len int
}
+func (sv *seenValues) Full() bool { return sv.len == seenValuesSize }
+
func (sv *seenValues) Put(v Value) {
- sv.values = append(sv.values, v)
+ sv.values[sv.len] = v
+ sv.len++
}
func (sv *seenValues) Contains(v Value) bool {
- for _, vv := range sv.values {
+ for _, vv := range sv.values[:sv.len] {
if vv == v {
return true
}
@@ -41,11 +44,11 @@ func (sv *seenValues) Contains(v Value) bool {
// being printed if the value has already been print
// - this is NOT recursion and SHOULD be printed
func (sv *seenValues) Pop() {
- sv.values = sv.values[:len(sv.values)-1]
+ sv.len--
}
func newSeenValues() *seenValues {
- return &seenValues{values: make([]Value, 0, defaultSeenValuesSize)}
+ return &seenValues{}
}
func (v StringValue) String() string {
@@ -72,6 +75,9 @@ func (av *ArrayValue) ProtectedString(seen *seenValues) string {
if seen.Contains(av) {
return fmt.Sprintf("%p", av)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
seen.Put(av)
defer seen.Pop()
@@ -104,6 +110,9 @@ func (sv *SliceValue) ProtectedString(seen *seenValues) string {
if seen.Contains(sv) {
return fmt.Sprintf("%p", sv)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
if ref, ok := sv.Base.(RefValue); ok {
return fmt.Sprintf("slice[%v]", ref)
@@ -134,6 +143,9 @@ func (pv PointerValue) ProtectedString(seen *seenValues) string {
if seen.Contains(pv) {
return fmt.Sprintf("%p", &pv)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
seen.Put(pv)
defer seen.Pop()
@@ -154,6 +166,9 @@ func (sv *StructValue) ProtectedString(seen *seenValues) string {
if seen.Contains(sv) {
return fmt.Sprintf("%p", sv)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
seen.Put(sv)
defer seen.Pop()
@@ -204,6 +219,9 @@ func (mv *MapValue) ProtectedString(seen *seenValues) string {
if seen.Contains(mv) {
return fmt.Sprintf("%p", mv)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
seen.Put(mv)
defer seen.Pop()
@@ -286,6 +304,9 @@ func (tv *TypedValue) ProtectedSprint(seen *seenValues, considerDeclaredType boo
if seen.Contains(tv.V) {
return fmt.Sprintf("%p", tv)
}
+ if seen.Full() {
+ return "<trimmed>"
+ }
// print declared type
if _, ok := tv.T.(*DeclaredType); ok && considerDeclaredType { The "true" error:
(The VM encounters this error, then attempts to dump the machine state for debugging into the error message, and thus "hangs" while trying to print the state). As for the solution, I'm not terribly sure. What I've thought of:
I'm not sure how/if the last part can be implemented in a nice and efficient way. cc/ @deelawn @mvertes @ajnavarro as you probably have ideas. |
@thehowl IMHO, If we really want to do what |
Yeah. What would be best is to implement a way to print an actual stacktrace (of the Gno code alone) so we can use that to debug in place of the machine state dump :/ |
…uote on args" This reverts commit 2f78ae2.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1736 +/- ##
==========================================
- Coverage 60.85% 60.73% -0.12%
==========================================
Files 563 563
Lines 75193 74979 -214
==========================================
- Hits 45760 45541 -219
- Misses 26065 26078 +13
+ Partials 3368 3360 -8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
From the review meeting: the agreed solution is to switch from doing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for isolating this in a txtar 🙏
I've left a small comment on how we can include this txtar, but not run it until we fix it (have it as a regression)
@@ -0,0 +1,64 @@ | |||
loadpkg gno.land/p/demo/avl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run this txtar conditionally, so it doesn't hang in the local suite, but is present in the codebase? cc @gfanton
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not have failing tests in the codebase unless we have a system that expects them to fail and checks that they are still failing
Ie. what we should do for the "challenges" directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the "bug" label so that this PR gets attention as demonstrating a bug.
This PR adds
cpu-cycle-overrun1.txtar
in gno/gno.land/cmd/gnoland/testdata . To run it, in gno/gno.land/cmd/gnoland enter:The call to Render hangs. You can see that the Render function is not very complicated, but some combination of avl.Tree and pointers seems to hang the GnoVM. We originally observed the problem in GnoSocial, but this code has been simplified as much as I can.
As explained in the file, if you comment out the final call to
AddData
, then Render returns quickly. So it is not a problem with too much data in memory. My wild guess is that there is indeed a CPU cycle overrun, but it hangs instead of quitting and returning the "CPU cycle overrun" status.